Overview

Dataset statistics

Number of variables: 15
Number of observations: 5851
Missing cells: 576
Missing cells (%): 0.7%
Duplicate rows: 2
Duplicate rows (%): < 0.1%
Total size in memory: 685.8 KiB
Average record size in memory: 120.0 B
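
The percentage and per-record figures follow from the raw counts in this table. A small sketch of that arithmetic, using only the numbers reported here (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 5851
missing_cells = 576
total_size_kib = 685.8

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing-cell share is taken over all cells (variables times observations), not over rows.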

Variable types

Categorical: 7
Numeric: 8

Warnings

locationcountry has constant value "US" [Constant]
Dataset has 2 (< 0.1%) duplicate rows [Duplicates]
locationcity has a high cardinality: 971 distinct values [High cardinality]
vehiclemake has a high cardinality: 54 distinct values [High cardinality]
vehiclemodel has a high cardinality: 526 distinct values [High cardinality]
renterTripsTaken is highly correlated with reviewCount [High correlation]
reviewCount is highly correlated with renterTripsTaken [High correlation]
Mean is highly correlated with Median and 1 other field [High correlation]
Median is highly correlated with Mean and 1 other field [High correlation]
Stdev is highly correlated with Mean and 1 other field [High correlation]
Mean is highly correlated with Median and 2 other fields [High correlation]
Median is highly correlated with Mean and 2 other fields [High correlation]
Stdev is highly correlated with Mean and 2 other fields [High correlation]
State_ab is highly correlated with Mean and 2 other fields [High correlation]
State_ab is highly correlated with locationcountry [High correlation]
ratedaily is highly correlated with vehiclemake [High correlation]
vehicletype is highly correlated with vehiclemake [High correlation]
vehicletype is highly correlated with locationcountry [High correlation]
vehiclemake is highly correlated with ratedaily and 2 other fields [High correlation]
vehiclemake is highly correlated with locationcountry and 1 other field [High correlation]
fuelType is highly correlated with vehiclemake [High correlation]
fuelType is highly correlated with locationcountry and 1 other field [High correlation]
locationcountry is highly correlated with vehicletype and 3 other fields [High correlation]
fuelType has 75 (1.3%) missing values [Missing]
rating has 501 (8.6%) missing values [Missing]
renterTripsTaken has 431 (7.4%) zeros [Zeros]
reviewCount has 501 (8.6%) zeros [Zeros]
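
The warnings report 501 missing values for rating and 501 zeros for reviewCount. Matching counts suggest, but do not prove, that a rating is missing exactly when a listing has no reviews. A minimal sketch of how that could be checked row by row; the rows below are hypothetical and only illustrate the comparison:

```python
# Hypothetical rows for illustration only; they are not taken from the dataset.
rows = [
    {"rating": 5.0, "reviewCount": 12},
    {"rating": None, "reviewCount": 0},
    {"rating": 4.9, "reviewCount": 3},
    {"rating": None, "reviewCount": 0},
]

# Row positions where the rating is missing, and where there are no reviews.
missing_rating = {i for i, r in enumerate(rows) if r["rating"] is None}
zero_reviews = {i for i, r in enumerate(rows) if r["reviewCount"] == 0}

# True only if the two conditions hold on exactly the same rows.
same_rows = missing_rating == zero_reviews
print(same_rows)
```

Equal totals alone would also be consistent with the two conditions falling on different rows, which is why the comparison is done on row positions.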

Reproduction

Analysis started: 2021-10-12 20:48:47.841235
Analysis finished: 2021-10-12 20:49:02.310394
Duration: 14.47 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
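
The reported duration is the difference between the two timestamps. A short check using Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2021-10-12 20:48:47.841235")
finished = datetime.fromisoformat("2021-10-12 20:49:02.310394")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```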

Variables

State_ab
Categorical

HIGH CORRELATION

Distinct: 46
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
CA | 966
FL | 836
TX | 499
CO | 238
NV | 233
Other values (41) | 3079

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 11702
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
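
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. This is a stand-in, not necessarily how the report computes its figures, and the standard library has no lookup for scripts or blocks. The input values are the five State_ab sample rows shown in this report.

```python
import unicodedata
from collections import Counter

# The five State_ab sample rows from this report.
values = ["WA", "NM", "NM", "NM", "NM"]

# unicodedata.category returns a two-letter code, e.g. "Lu" for Uppercase Letter.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)
print(category_counts)
```

All ten characters fall under `Lu`, which corresponds to the single "Uppercase Letter" category reported for this variable.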

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: WA
2nd row: NM
3rd row: NM
4th row: NM
5th row: NM

Common Values

Value | Count | Frequency (%)
CA | 966 | 16.5%
FL | 836 | 14.3%
TX | 499 | 8.5%
CO | 238 | 4.1%
NV | 233 | 4.0%
GA | 230 | 3.9%
AZ | 223 | 3.8%
NC | 219 | 3.7%
NJ | 211 | 3.6%
HI | 200 | 3.4%
Other values (36) | 1996 | 34.1%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
ca | 966 | 16.5%
fl | 836 | 14.3%
tx | 499 | 8.5%
co | 238 | 4.1%
nv | 233 | 4.0%
ga | 230 | 3.9%
az | 223 | 3.8%
nc | 219 | 3.7%
nj | 211 | 3.6%
hi | 200 | 3.4%
Other values (36) | 1996 | 34.1%

Most occurring characters

Value | Count | Frequency (%)
A | 1984 | 17.0%
C | 1506 | 12.9%
L | 1032 | 8.8%
N | 1001 | 8.6%
T | 839 | 7.2%
F | 836 | 7.1%
O | 609 | 5.2%
I | 539 | 4.6%
X | 499 | 4.3%
M | 406 | 3.5%
Other values (14) | 2451 | 20.9%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 11702 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11702 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11702 | 100.0%

Most frequent character per category, script and block

All 11702 characters belong to one category (Uppercase Letter), one script (Latin) and one block (ASCII), so the per-category, per-script and per-block tables are identical to the "Most occurring characters" table.

fuelType
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): 0.1%
Missing: 75
Missing (%): 1.3%
Memory size: 45.8 KiB

Value | Count
GASOLINE | 4810
ELECTRIC | 622
HYBRID | 274
DIESEL | 70

Length

Max length: 8
Median length: 8
Mean length: 7.880886427
Min length: 6
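
The mean length and the total character count can be recovered from the four category counts reported for fuelType. A small sketch of that calculation (missing values are excluded, which is an inference from the numbers matching, not something the report states):

```python
# Category counts copied from the fuelType value table in this report.
counts = {"GASOLINE": 4810, "ELECTRIC": 622, "HYBRID": 274, "DIESEL": 70}

# Each category contributes len(name) characters per occurrence.
total_chars = sum(len(name) * n for name, n in counts.items())
non_missing = sum(counts.values())
mean_length = total_chars / non_missing

print(total_chars)
print(non_missing)
print(round(mean_length, 6))
```

The result agrees with the reported total of 45520 characters and mean length of 7.880886427.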

Characters and Unicode

Total characters: 45520
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ELECTRIC
2nd row: ELECTRIC
3rd row: HYBRID
4th row: GASOLINE
5th row: GASOLINE

Common Values

Value | Count | Frequency (%)
GASOLINE | 4810 | 82.2%
ELECTRIC | 622 | 10.6%
HYBRID | 274 | 4.7%
DIESEL | 70 | 1.2%
(Missing) | 75 | 1.3%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Words

Value | Count | Frequency (%)
gasoline | 4810 | 83.3%
electric | 622 | 10.8%
hybrid | 274 | 4.7%
diesel | 70 | 1.2%
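
The gasoline share appears as 82.2% in one fuelType table and 83.3% in another. The two figures are consistent with different denominators: all rows in the first case, non-missing rows only in the second. A short sketch of both calculations from the counts in this report:

```python
n_rows = 5851     # number of observations
n_missing = 75    # missing fuelType values
gasoline = 4810   # rows with fuelType GASOLINE

share_of_all_rows = 100 * gasoline / n_rows
share_of_non_missing = 100 * gasoline / (n_rows - n_missing)

print(f"{share_of_all_rows:.1f}%")
print(f"{share_of_non_missing:.1f}%")
```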

Most occurring characters

Value | Count | Frequency (%)
E | 6194 | 13.6%
I | 5776 | 12.7%
L | 5502 | 12.1%
S | 4880 | 10.7%
G | 4810 | 10.6%
A | 4810 | 10.6%
O | 4810 | 10.6%
N | 4810 | 10.6%
C | 1244 | 2.7%
R | 896 | 2.0%
Other values (5) | 1788 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 45520 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 45520 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 45520 | 100.0%

Most frequent character per category, script and block

All 45520 characters belong to one category (Uppercase Letter), one script (Latin) and one block (ASCII), so the per-category, per-script and per-block tables are identical to the "Most occurring characters" table.

rating
Real number (ℝ≥0)

MISSING

Distinct: 80
Distinct (%): 1.5%
Missing: 501
Missing (%): 8.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.920325234
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4.67
Q1: 4.9
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.1

Descriptive statistics

Standard deviation: 0.182425257
Coefficient of variation (CV): 0.0370758534
Kurtosis: 132.1368328
Mean: 4.920325234
Median Absolute Deviation (MAD): 0
Skewness: -8.54633606
Sum: 26323.74
Variance: 0.03327897441
Monotonicity: Not monotonic
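
Several of the descriptive statistics for rating are simple functions of the mean, the standard deviation and the number of non-missing values. A sketch that recomputes them from the reported figures:

```python
# Figures copied from the rating statistics in this report.
mean = 4.920325234
std = 0.182425257
n_observations = 5851
n_missing = 501

cv = std / mean                              # coefficient of variation
variance = std ** 2                          # variance is the squared standard deviation
total = mean * (n_observations - n_missing)  # sum over the non-missing ratings

print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.6f}")
print(f"Sum: {total:.2f}")
```

The sum matches the reported value only when the 501 missing ratings are left out of the count.
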
[Figure: histogram with fixed size bins (bins=50)]
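Common values

Value | Count | Frequency (%)
5 | 2800 | 47.9%
4.97 | 181 | 3.1%
4.96 | 160 | 2.7%
4.98 | 140 | 2.4%
4.94 | 140 | 2.4%
4.92 | 138 | 2.4%
4.95 | 134 | 2.3%
4.93 | 103 | 1.8%
4.9 | 100 | 1.7%
4.91 | 94 | 1.6%
Other values (70) | 1360 | 23.2%
(Missing) | 501 | 8.6%

Extreme values (smallest 10)

Value | Count | Frequency (%)
1 | 2 | < 0.1%
1.5 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 4 | 0.1%
3.25 | 1 | < 0.1%
3.5 | 3 | 0.1%
3.56 | 1 | < 0.1%
3.67 | 5 | 0.1%
3.86 | 1 | < 0.1%
4 | 23 | 0.4%

Extreme values (largest 10)

Value | Count | Frequency (%)
5 | 2800 | 47.9%
4.99 | 71 | 1.2%
4.98 | 140 | 2.4%
4.97 | 181 | 3.1%
4.96 | 160 | 2.7%
4.95 | 134 | 2.3%
4.94 | 140 | 2.4%
4.93 | 103 | 1.8%
4.92 | 138 | 2.4%
4.91 | 94 | 1.6%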

renterTripsTaken
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 238
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.4773543
Minimum: 0
Maximum: 395
Zeros: 431
Zeros (%): 7.4%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 18
Q3: 46
95-th percentile: 117
Maximum: 395
Range: 395
Interquartile range (IQR): 41
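
The quartiles give a quick sense of how far the tail of renterTripsTaken extends. The sketch below applies the common 1.5 x IQR rule (Tukey fences) to the reported quantiles. This rule is an outside convention used here for illustration; the report itself does not flag outliers this way.

```python
# Quantiles copied from the renterTripsTaken table in this report.
q1, q3 = 5, 46
minimum, maximum = 0, 395

iqr = q3 - q1
# Fences at 1.5 * IQR beyond the quartiles; the lower one is clipped at the minimum.
lower_fence = max(minimum, q1 - 1.5 * iqr)
upper_fence = q3 + 1.5 * iqr

print(iqr)
print(upper_fence)
print(maximum > upper_fence)
```

Under that convention, any listing with more than 107.5 trips would count as unusually high, which covers the maximum of 395 and also the 95-th percentile of 117.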

Descriptive statistics

Standard deviation: 41.89895405
Coefficient of variation (CV): 1.25156109
Kurtosis: 9.04885703
Mean: 33.4773543
Median Absolute Deviation (MAD): 16
Skewness: 2.495920239
Sum: 195876
Variance: 1755.52235
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
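Common values

Value | Count | Frequency (%)
0 | 431 | 7.4%
1 | 292 | 5.0%
3 | 214 | 3.7%
2 | 212 | 3.6%
4 | 173 | 3.0%
5 | 162 | 2.8%
6 | 144 | 2.5%
7 | 142 | 2.4%
9 | 138 | 2.4%
8 | 122 | 2.1%
Other values (228) | 3821 | 65.3%

Extreme values (smallest 10)

Value | Count | Frequency (%)
0 | 431 | 7.4%
1 | 292 | 5.0%
2 | 212 | 3.6%
3 | 214 | 3.7%
4 | 173 | 3.0%
5 | 162 | 2.8%
6 | 144 | 2.5%
7 | 142 | 2.4%
8 | 122 | 2.1%
9 | 138 | 2.4%

Extreme values (largest 10)

Value | Count | Frequency (%)
395 | 1 | < 0.1%
370 | 1 | < 0.1%
347 | 1 | < 0.1%
346 | 1 | < 0.1%
333 | 1 | < 0.1%
325 | 1 | < 0.1%
297 | 1 | < 0.1%
296 | 1 | < 0.1%
295 | 1 | < 0.1%
292 | 1 | < 0.1%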

reviewCount
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 203
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.45479405
Minimum: 0
Maximum: 321
Zeros: 501
Zeros (%): 8.6%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
Median: 16
Q3: 39
95-th percentile: 99
Maximum: 321
Range: 321
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 35.13611334
Coefficient of variation (CV): 1.234804697
Kurtosis: 7.769007665
Mean: 28.45479405
Median Absolute Deviation (MAD): 14
Skewness: 2.350867079
Sum: 166489
Variance: 1234.54646
Monotonicity: Not monotonic
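
reviewCount carries a HIGH CORRELATION flag against renterTripsTaken. As a reminder of what such a flag measures, here is a minimal sketch of the Pearson correlation coefficient written with the standard library only. The paired values are hypothetical (they are not rows from the dataset), and the report does not say here which correlation measures triggered the flag.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical (trips, reviews) pairs: reviews grow roughly in step with trips.
trips = [0, 5, 18, 46, 117]
reviews = [0, 4, 16, 39, 99]

r = pearson(trips, reviews)
print(round(r, 3))
```

A coefficient close to 1 means one column adds little information beyond the other, which is why profiling tools tend to flag such pairs.
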
[Figure: histogram with fixed size bins (bins=50)]
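Common values

Value | Count | Frequency (%)
0 | 501 | 8.6%
1 | 324 | 5.5%
3 | 242 | 4.1%
2 | 227 | 3.9%
4 | 190 | 3.2%
6 | 159 | 2.7%
5 | 159 | 2.7%
7 | 156 | 2.7%
8 | 154 | 2.6%
10 | 129 | 2.2%
Other values (193) | 3610 | 61.7%

Extreme values (smallest 10)

Value | Count | Frequency (%)
0 | 501 | 8.6%
1 | 324 | 5.5%
2 | 227 | 3.9%
3 | 242 | 4.1%
4 | 190 | 3.2%
5 | 159 | 2.7%
6 | 159 | 2.7%
7 | 156 | 2.7%
8 | 154 | 2.6%
9 | 120 | 2.1%

Extreme values (largest 10)

Value | Count | Frequency (%)
321 | 1 | < 0.1%
285 | 1 | < 0.1%
280 | 1 | < 0.1%
262 | 1 | < 0.1%
253 | 1 | < 0.1%
248 | 1 | < 0.1%
246 | 1 | < 0.1%
243 | 1 | < 0.1%
242 | 1 | < 0.1%
237 | 1 | < 0.1%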

locationcity
Categorical

HIGH CARDINALITY

Distinct: 971
Distinct (%): 16.6%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
Las Vegas | 186
Portland | 166
San Diego | 163
Phoenix | 137
Orlando | 132
Other values (966) | 5067
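
The distinct counts of the seven categorical variables line up with the Constant and High cardinality warnings in the overview. The sketch below reproduces that split, assuming a cutoff of 50 distinct values. The cutoff is an assumption chosen because it separates the flagged columns from the unflagged ones in this report; the report does not state the threshold it used.

```python
# Distinct counts copied from the per-variable summaries in this report.
distinct_counts = {
    "State_ab": 46,
    "fuelType": 4,
    "locationcity": 971,
    "locationcountry": 1,
    "vehiclemake": 54,
    "vehiclemodel": 526,
    "vehicletype": 5,
}

CARDINALITY_THRESHOLD = 50  # assumed cutoff, not taken from the report

high_cardinality = sorted(
    name for name, n in distinct_counts.items() if n > CARDINALITY_THRESHOLD
)
constant = sorted(name for name, n in distinct_counts.items() if n == 1)

print(high_cardinality)
print(constant)
```

With that cutoff, State_ab (46 distinct values) stays below the line while vehiclemake (54) is flagged, matching the warnings.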

Length

Max length: 30
Median length: 9
Mean length: 8.970432405
Min length: 4

Characters and Unicode

Total characters: 52486
Distinct characters: 56
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 423
Unique (%): 7.2%

Sample

1st row: Seattle
2nd row: Tijeras
3rd row: Albuquerque
4th row: Albuquerque
5th row: Albuquerque

Common Values

Value | Count | Frequency (%)
Las Vegas | 186 | 3.2%
Portland | 166 | 2.8%
San Diego | 163 | 2.8%
Phoenix | 137 | 2.3%
Orlando | 132 | 2.3%
Austin | 126 | 2.2%
Miami | 119 | 2.0%
Honolulu | 114 | 1.9%
Los Angeles | 111 | 1.9%
San Antonio | 96 | 1.6%
Other values (961) | 4501 | 76.9%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
san | 390 | 4.9%
city | 233 | 2.9%
las | 187 | 2.3%
vegas | 187 | 2.3%
portland | 167 | 2.1%
diego | 164 | 2.1%
beach | 151 | 1.9%
miami | 144 | 1.8%
phoenix | 137 | 1.7%
orlando | 134 | 1.7%
Other values (953) | 6094 | 76.3%

Most occurring characters

ValueCountFrequency (%)
a5228
 
10.0%
e4542
 
8.7%
n4185
 
8.0%
o4049
 
7.7%
i3363
 
6.4%
l3277
 
6.2%
r2660
 
5.1%
t2635
 
5.0%
s2517
 
4.8%
2152
 
4.1%
Other values (46)17878
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter42254
80.5%
Uppercase Letter8047
 
15.3%
Space Separator2152
 
4.1%
Other Punctuation29
 
0.1%
Dash Punctuation3
 
< 0.1%
Final Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a5228
12.4%
e4542
10.7%
n4185
9.9%
o4049
9.6%
i3363
 
8.0%
l3277
 
7.8%
r2660
 
6.3%
t2635
 
6.2%
s2517
 
6.0%
u1345
 
3.2%
Other values (16)8453
20.0%
Uppercase Letter
ValueCountFrequency (%)
S972
12.1%
C776
 
9.6%
A709
 
8.8%
L637
 
7.9%
P621
 
7.7%
M620
 
7.7%
B476
 
5.9%
D433
 
5.4%
H368
 
4.6%
V283
 
3.5%
Other values (15)2152
26.7%
Other Punctuation
ValueCountFrequency (%)
.24
82.8%
'5
 
17.2%
Space Separator
ValueCountFrequency (%)
2152
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin50301
95.8%
Common2185
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a5228
 
10.4%
e4542
 
9.0%
n4185
 
8.3%
o4049
 
8.0%
i3363
 
6.7%
l3277
 
6.5%
r2660
 
5.3%
t2635
 
5.2%
s2517
 
5.0%
u1345
 
2.7%
Other values (41)16500
32.8%
Common
ValueCountFrequency (%)
2152
98.5%
.24
 
1.1%
'5
 
0.2%
-3
 
0.1%
1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII52485
> 99.9%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a5228
 
10.0%
e4542
 
8.7%
n4185
 
8.0%
o4049
 
7.7%
i3363
 
6.4%
l3277
 
6.2%
r2660
 
5.1%
t2635
 
5.0%
s2517
 
4.8%
2152
 
4.1%
Other values (45)17877
34.1%
Punctuation
ValueCountFrequency (%)
1
100.0%

locationcountry
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
US | 5851

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 11702
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: US
2nd row: US
3rd row: US
4th row: US
5th row: US

Common Values

Value | Count | Frequency (%)
US | 5851 | 100.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Words

Value | Count | Frequency (%)
us | 5851 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
U | 5851 | 50.0%
S | 5851 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 11702 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11702 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11702 | 100.0%

Most frequent character per category, script and block

All 11702 characters belong to one category (Uppercase Letter), one script (Latin) and one block (ASCII), so the per-category, per-script and per-block tables are identical to the "Most occurring characters" table.

ratedaily
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 294
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.69150573
Minimum: 20
Maximum: 1500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 20
5-th percentile: 30
Q1: 45
Median: 69
Q3: 110
95-th percentile: 225
Maximum: 1500
Range: 1480
Interquartile range (IQR): 65

Descriptive statistics

Standard deviation: 96.08092046
Coefficient of variation (CV): 1.025503003
Kurtosis: 57.63689113
Mean: 93.69150573
Median Absolute Deviation (MAD): 29
Skewness: 6.041697797
Sum: 548189
Variance: 9231.543277
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
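
With fixed size bins, every bin spans the same width, so a long right tail such as the one in ratedaily ends up concentrated in the first few bins. A minimal sketch of equal-width binning over the reported range; the `bin_index` helper is hypothetical and only illustrates the idea, it is not the plotting code behind the report:

```python
# Range and bin count taken from the ratedaily statistics in this report.
minimum, maximum, bins = 20, 1500, 50
width = (maximum - minimum) / bins

def bin_index(value):
    """Zero-based index of the equal-width bin that contains value."""
    index = int((value - minimum) // width)
    # The maximum belongs to the last bin rather than to a bin of its own.
    return min(index, bins - 1)

print(width)
print(bin_index(69))    # the median
print(bin_index(225))   # the 95-th percentile
print(bin_index(1500))  # the maximum
```

Each bin covers about 29.6 units of daily rate, so values up to the 95-th percentile land within the first handful of the 50 bins.
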
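Common values

Value | Count | Frequency (%)
35 | 162 | 2.8%
45 | 142 | 2.4%
49 | 135 | 2.3%
40 | 129 | 2.2%
50 | 119 | 2.0%
30 | 116 | 2.0%
39 | 114 | 1.9%
55 | 109 | 1.9%
99 | 109 | 1.9%
79 | 103 | 1.8%
Other values (284) | 4613 | 78.8%

Extreme values (smallest 10)

Value | Count | Frequency (%)
20 | 11 | 0.2%
21 | 4 | 0.1%
22 | 5 | 0.1%
23 | 6 | 0.1%
24 | 13 | 0.2%
25 | 46 | 0.8%
26 | 40 | 0.7%
27 | 34 | 0.6%
28 | 39 | 0.7%
29 | 65 | 1.1%

Extreme values (largest 10)

Value | Count | Frequency (%)
1500 | 1 | < 0.1%
1485 | 1 | < 0.1%
1400 | 1 | < 0.1%
1200 | 1 | < 0.1%
1199 | 6 | 0.1%
999 | 5 | 0.1%
990 | 1 | < 0.1%
899 | 1 | < 0.1%
800 | 1 | < 0.1%
799 | 4 | 0.1%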

vehiclemake
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 54
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
Tesla | 598
Toyota | 591
BMW | 456
Ford | 436
Chevrolet | 431
Other values (49) | 3339

Length

Max length: 13
Median length: 5
Mean length: 6.144248846
Min length: 3

Characters and Unicode

Total characters: 35950
Distinct characters: 47
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Tesla
2nd row: Tesla
3rd row: Toyota
4th row: Ford
5th row: Chrysler

Common Values

Value | Count | Frequency (%)
Tesla | 598 | 10.2%
Toyota | 591 | 10.1%
BMW | 456 | 7.8%
Ford | 436 | 7.5%
Chevrolet | 431 | 7.4%
Mercedes-Benz | 342 | 5.8%
Nissan | 291 | 5.0%
Jeep | 279 | 4.8%
Honda | 257 | 4.4%
Porsche | 187 | 3.2%
Other values (44) | 1983 | 33.9%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
tesla | 598 | 10.0%
toyota | 591 | 9.9%
bmw | 456 | 7.6%
ford | 436 | 7.3%
chevrolet | 431 | 7.2%
mercedes-benz | 391 | 6.5%
nissan | 291 | 4.9%
jeep | 279 | 4.7%
honda | 257 | 4.3%
porsche | 187 | 3.1%
Other values (46) | 2056 | 34.4%

Most occurring characters

ValueCountFrequency (%)
e4496
 
12.5%
o3123
 
8.7%
a3069
 
8.5%
s2273
 
6.3%
r2045
 
5.7%
d1826
 
5.1%
n1526
 
4.2%
l1444
 
4.0%
i1265
 
3.5%
T1239
 
3.4%
Other values (37)13644
38.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter27792
77.3%
Uppercase Letter7636
 
21.2%
Dash Punctuation400
 
1.1%
Space Separator122
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e4496
16.2%
o3123
11.2%
a3069
11.0%
s2273
8.2%
r2045
 
7.4%
d1826
 
6.6%
n1526
 
5.5%
l1444
 
5.2%
i1265
 
4.6%
t1200
 
4.3%
Other values (14)5525
19.9%
Uppercase Letter
ValueCountFrequency (%)
T1239
16.2%
M1125
14.7%
B826
10.8%
C598
7.8%
F499
 
6.5%
W456
 
6.0%
H443
 
5.8%
N338
 
4.4%
J337
 
4.4%
A296
 
3.9%
Other values (11)1479
19.4%
Dash Punctuation
ValueCountFrequency (%)
-400
100.0%
Space Separator
ValueCountFrequency (%)
122
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin35428
98.5%
Common522
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e4496
 
12.7%
o3123
 
8.8%
a3069
 
8.7%
s2273
 
6.4%
r2045
 
5.8%
d1826
 
5.2%
n1526
 
4.3%
l1444
 
4.1%
i1265
 
3.6%
T1239
 
3.5%
Other values (35)13122
37.0%
Common
ValueCountFrequency (%)
-400
76.6%
122
 
23.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII35950
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e4496
 
12.5%
o3123
 
8.7%
a3069
 
8.5%
s2273
 
6.3%
r2045
 
5.7%
d1826
 
5.1%
n1526
 
4.2%
l1444
 
4.0%
i1265
 
3.5%
T1239
 
3.4%
Other values (37)13644
38.0%

vehiclemodel
Categorical

HIGH CARDINALITY

Distinct: 526
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
Model 3 | 331
Mustang | 151
Model S | 130
Wrangler | 123
Model X | 114
Other values (521) | 5002

Length

Max length: 23
Median length: 7
Mean length: 6.840369168
Min length: 1

Characters and Unicode

Total characters: 40023
Distinct characters: 63
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 145
Unique (%): 2.5%

Sample

1st row: Model X
2nd row: Model X
3rd row: Prius
4th row: Mustang
5th row: Sebring

Common Values

Value | Count | Frequency (%)
Model 3 | 331 | 5.7%
Mustang | 151 | 2.6%
Model S | 130 | 2.2%
Wrangler | 123 | 2.1%
Model X | 114 | 1.9%
C-Class | 109 | 1.9%
Corolla | 100 | 1.7%
3 Series | 94 | 1.6%
Corvette | 78 | 1.3%
Camry | 77 | 1.3%
Other values (516) | 4544 | 77.7%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
model | 598 | 7.9%
3 | 445 | 5.9%
series | 237 | 3.1%
wrangler | 172 | 2.3%
mustang | 152 | 2.0%
s | 131 | 1.7%
x | 114 | 1.5%
c-class | 109 | 1.4%
corolla | 105 | 1.4%
prius | 100 | 1.3%
Other values (473) | 5425 | 71.5%

Most occurring characters

ValueCountFrequency (%)
e3396
 
8.5%
a3387
 
8.5%
r2828
 
7.1%
o2338
 
5.8%
l2072
 
5.2%
s2030
 
5.1%
n1872
 
4.7%
1737
 
4.3%
i1642
 
4.1%
C1502
 
3.8%
Other values (53)17219
43.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter27090
67.7%
Uppercase Letter8176
 
20.4%
Decimal Number2424
 
6.1%
Space Separator1737
 
4.3%
Dash Punctuation596
 
1.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C1502
18.4%
S1168
14.3%
M982
12.0%
R462
 
5.7%
X438
 
5.4%
E417
 
5.1%
A378
 
4.6%
G365
 
4.5%
F309
 
3.8%
T301
 
3.7%
Other values (16)1854
22.7%
Lowercase Letter
ValueCountFrequency (%)
e3396
12.5%
a3387
12.5%
r2828
10.4%
o2338
8.6%
l2072
7.6%
s2030
7.5%
n1872
 
6.9%
i1642
 
6.1%
t1497
 
5.5%
d1174
 
4.3%
Other values (15)4854
17.9%
Decimal Number
ValueCountFrequency (%)
3599
24.7%
0507
20.9%
5403
16.6%
4261
10.8%
1224
 
9.2%
2106
 
4.4%
896
 
4.0%
695
 
3.9%
786
 
3.5%
947
 
1.9%
Space Separator
ValueCountFrequency (%)
1737
100.0%
Dash Punctuation
ValueCountFrequency (%)
-596
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin35266
88.1%
Common4757
 
11.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e3396
 
9.6%
a3387
 
9.6%
r2828
 
8.0%
o2338
 
6.6%
l2072
 
5.9%
s2030
 
5.8%
n1872
 
5.3%
i1642
 
4.7%
C1502
 
4.3%
t1497
 
4.2%
Other values (41)12702
36.0%
Common
ValueCountFrequency (%)
1737
36.5%
3599
 
12.6%
-596
 
12.5%
0507
 
10.7%
5403
 
8.5%
4261
 
5.5%
1224
 
4.7%
2106
 
2.2%
896
 
2.0%
695
 
2.0%
Other values (2)133
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII40023
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e3396
 
8.5%
a3387
 
8.5%
r2828
 
7.1%
o2338
 
5.8%
l2072
 
5.2%
s2030
 
5.1%
n1872
 
4.7%
1737
 
4.3%
i1642
 
4.1%
C1502
 
3.8%
Other values (53)17219
43.0%

vehicletype
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 45.8 KiB

Value | Count
car | 3659
suv | 1714
minivan | 232
truck | 191
van | 55

Length

Max length: 7
Median length: 3
Mean length: 3.223893352
Min length: 3

Characters and Unicode

Total characters: 18863
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: suv
2nd row: suv
3rd row: car
4th row: car
5th row: car

Common Values

Value | Count | Frequency (%)
car | 3659 | 62.5%
suv | 1714 | 29.3%
minivan | 232 | 4.0%
truck | 191 | 3.3%
van | 55 | 0.9%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of category frequencies]

Words

Value | Count | Frequency (%)
car | 3659 | 62.5%
suv | 1714 | 29.3%
minivan | 232 | 4.0%
truck | 191 | 3.3%
van | 55 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
a | 3946 | 20.9%
c | 3850 | 20.4%
r | 3850 | 20.4%
v | 2001 | 10.6%
u | 1905 | 10.1%
s | 1714 | 9.1%
n | 519 | 2.8%
i | 464 | 2.5%
m | 232 | 1.2%
t | 191 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 18863 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18863 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 18863 | 100.0%

Most frequent character per category, script and block

All 18863 characters belong to one category (Lowercase Letter), one script (Latin) and one block (ASCII), so the per-category, per-script and per-block tables are identical to the "Most occurring characters" table.

vehicleyear
Real number (ℝ≥0)

Distinct: 34
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.340113
Minimum: 1955
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 1955
5-th percentile: 2009
Q1: 2014
Median: 2016
Q3: 2018
95-th percentile: 2020
Maximum: 2020
Range: 65
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.050813478
Coefficient of variation (CV): 0.002009990002
Kurtosis: 48.65109398
Mean: 2015.340113
Median Absolute Deviation (MAD): 2
Skewness: -4.467397477
Sum: 11791755
Variance: 16.40908983
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=34)]
Common values

Value | Count | Frequency (%)
2018 | 828 | 14.2%
2019 | 769 | 13.1%
2017 | 707 | 12.1%
2016 | 685 | 11.7%
2015 | 627 | 10.7%
2014 | 453 | 7.7%
2013 | 371 | 6.3%
2020 | 324 | 5.5%
2012 | 291 | 5.0%
2011 | 238 | 4.1%
Other values (24) | 558 | 9.5%

Extreme values (smallest 10)

Value | Count | Frequency (%)
1955 | 1 | < 0.1%
1957 | 1 | < 0.1%
1961 | 1 | < 0.1%
1965 | 2 | < 0.1%
1966 | 2 | < 0.1%
1968 | 1 | < 0.1%
1969 | 1 | < 0.1%
1972 | 1 | < 0.1%
1976 | 1 | < 0.1%
1979 | 1 | < 0.1%

Extreme values (largest 10)

Value | Count | Frequency (%)
2020 | 324 | 5.5%
2019 | 769 | 13.1%
2018 | 828 | 14.2%
2017 | 707 | 12.1%
2016 | 685 | 11.7%
2015 | 627 | 10.7%
2014 | 453 | 7.7%
2013 | 371 | 6.3%
2012 | 291 | 5.0%
2011 | 238 | 4.1%

Mean
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 46
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68543.64135
Minimum: 48924.00312
Maximum: 90668.42188
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 48924.00312
5-th percentile: 56271.95246
Q1: 60887.88983
Median: 65407.01043
Q3: 78126.7378
95-th percentile: 87689.6041
Maximum: 90668.42188
Range: 41744.41876
Interquartile range (IQR): 17238.84798

Descriptive statistics

Standard deviation: 9066.102436
Coefficient of variation (CV): 0.1322675927
Kurtosis: -0.7781630751
Mean: 68543.64135
Median Absolute Deviation (MAD): 6725.204125
Skewness: 0.4808441228
Sum: 401048845.5
Variance: 82194213.38
Monotonicity: Not monotonic
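
The Mean column has 46 distinct values, the same number as State_ab, and its most common values occur 966, 836 and 499 times, the same counts as the most common states. That pattern is what a per-state lookup would produce, although the report does not say how the column was built. A minimal sketch of the effect, with made-up state figures:

```python
from collections import Counter

# Hypothetical per-state figures; the real lookup table is not part of this report.
state_figure = {"CA": 78000.0, "FL": 61000.0, "TX": 65000.0}

# Hypothetical listings, each identified only by its state.
listings = ["CA", "FL", "CA", "TX", "FL", "CA"]

# Attaching the state-level figure to each listing copies the state's frequency.
figure_column = [state_figure[state] for state in listings]

state_counts = Counter(listings)
figure_counts = Counter(figure_column)

print(sorted(state_counts.values()))
print(sorted(figure_counts.values()))
```

A column derived this way is fully determined by the state, which would also explain why State_ab, Mean, Median and Stdev are flagged as highly correlated with one another.
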
[Figure: histogram with fixed size bins (bins=46)]
Common values

Value | Count | Frequency (%)
78126.7378 | 966 | 16.5%
60887.88983 | 836 | 14.3%
65407.01043 | 499 | 8.5%
73322.82129 | 238 | 4.1%
65684.12075 | 233 | 4.0%
60354.3912 | 230 | 3.9%
62578.07131 | 223 | 3.8%
57750.10492 | 219 | 3.7%
88657.64414 | 211 | 3.6%
77859.58696 | 200 | 3.4%
Other values (36) | 1996 | 34.1%

Extreme values (smallest 10)

Value | Count | Frequency (%)
48924.00312 | 4 | 0.1%
52060.36176 | 4 | 0.1%
52291.99597 | 4 | 0.1%
53612.92586 | 24 | 0.4%
55121.33125 | 51 | 0.9%
55250.16298 | 29 | 0.5%
55432.26419 | 37 | 0.6%
55715.24837 | 48 | 0.8%
56211.44295 | 28 | 0.5%
56271.95246 | 118 | 2.0%

Extreme values (largest 10)

Value | Count | Frequency (%)
90668.42188 | 12 | 0.2%
89227.21972 | 23 | 0.4%
88657.64414 | 211 | 3.6%
87689.6041 | 72 | 1.2%
84878.68358 | 46 | 0.8%
79401.74013 | 145 | 2.5%
78126.7378 | 966 | 16.5%
77859.58696 | 200 | 3.4%
77670.20952 | 45 | 0.8%
76113.50382 | 5 | 0.1%

Median
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 46
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88652.61973
Minimum: 52381.60588
Maximum: 125685.83
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 52381.60588
5-th percentile: 68207.05664
Q1: 75203.75677
Median: 85856.51304
Q3: 100581.7329
95-th percentile: 113988.2321
Maximum: 125685.83
Range: 73304.22407
Interquartile range (IQR): 25377.97615

Descriptive statistics

Standard deviation: 15009.00471
Coefficient of variation (CV): 0.1693013106
Kurtosis: -0.3603325534
Mean: 88652.61973
Median Absolute Deviation (MAD): 13656.40901
Skewness: 0.40347098
Sum: 518706478
Variance: 225270222.4
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=46)]
Common values

Value | Count | Frequency (%)
100581.7329 | 966 | 16.5%
75203.75677 | 836 | 14.3%
85856.51304 | 499 | 8.5%
99512.92205 | 238 | 4.1%
90228.09811 | 233 | 4.0%
71669.21149 | 230 | 3.9%
79373.54457 | 223 | 3.8%
70692.51148 | 219 | 3.7%
125685.83 | 211 | 3.6%
95679.36232 | 200 | 3.4%
Other values (36) | 1996 | 34.1%

Extreme values (smallest 10)

Value | Count | Frequency (%)
52381.60588 | 4 | 0.1%
57423.01246 | 4 | 0.1%
59831.22505 | 37 | 0.6%
61972.66667 | 51 | 0.9%
62771.24144 | 24 | 0.4%
63566.33468 | 4 | 0.1%
64374.3501 | 29 | 0.5%
65054.75082 | 118 | 2.0%
66040.71338 | 13 | 0.2%
68207.05664 | 48 | 0.8%

Extreme values (largest 10)

Value | Count | Frequency (%)
125685.83 | 211 | 3.6%
120557.3493 | 23 | 0.4%
118008.0388 | 46 | 0.8%
113988.2321 | 72 | 1.2%
109378.0805 | 198 | 3.4%
108911.6952 | 45 | 0.8%
105751.9817 | 16 | 0.3%
103971.8637 | 145 | 2.5%
101146.0076 | 5 | 0.1%
100581.7329 | 966 | 16.5%

Stdev
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 46
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48071.81197
Minimum: 39858.7757
Maximum: 62519.21875
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.8 KiB

Quantile statistics

Minimum: 39858.7757
5-th percentile: 42670.00164
Q1: 44747.91321
Median: 46045.53391
Q3: 52302.68841
95-th percentile: 54828.65529
Maximum: 62519.21875
Range: 22660.44305
Interquartile range (IQR): 7554.775198

Descriptive statistics

Standard deviation: 4386.390204
Coefficient of variation (CV): 0.09124661676
Kurtosis: -0.6397604627
Mean: 48071.81197
Median Absolute Deviation (MAD): 2717.886772
Skewness: 0.6371134389
Sum: 281268171.8
Variance: 19240419.02
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=46)]
Common values

Value | Count | Frequency (%)
53652.97896 | 966 | 16.5%
45366.67249 | 836 | 14.3%
46045.53391 | 499 | 8.5%
49529.67681 | 238 | 4.1%
44747.91321 | 233 | 4.0%
44740.78484 | 230 | 3.9%
43626.81686 | 223 | 3.8%
43425.34208 | 219 | 3.7%
57617.36261 | 211 | 3.6%
52302.68841 | 200 | 3.4%
Other values (36) | 1996 | 34.1%

Extreme values (smallest 10)

Value | Count | Frequency (%)
39858.7757 | 4 | 0.1%
40817.95882 | 4 | 0.1%
41619.24857 | 49 | 0.8%
41725.63878 | 24 | 0.4%
42072.82244 | 38 | 0.6%
42122.94631 | 28 | 0.5%
42184.15493 | 29 | 0.5%
42231.50403 | 4 | 0.1%
42322.26458 | 51 | 0.9%
42550.54514 | 61 | 1.0%

Extreme values (largest 10)

Value | Count | Frequency (%)
62519.21875 | 12 | 0.2%
58260.86716 | 46 | 0.8%
58091.32394 | 23 | 0.4%
57617.36261 | 211 | 3.6%
54828.65529 | 72 | 1.2%
53652.97896 | 966 | 16.5%
52377 | 16 | 0.3%
52302.68841 | 200 | 3.4%
52030.84733 | 5 | 0.1%
51760.06667 | 45 | 0.8%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
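
The calculation can be sketched in plain Python as follows. This is an illustrative implementation of the definition, not the code used to produce the report; the function name and the sample data are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)

# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```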

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
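
A minimal sketch of this procedure: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is a common convention and is assumed here; the function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the rank variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# A nonlinear but strictly increasing relationship is a perfect monotonic correlation.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```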

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
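
The pair-counting definition translates directly into code. The sketch below implements the form stated here (often called tau-a), which makes no adjustment for ties. Some libraries default to a tie-adjusted variant (tau-b), so results can differ when ties are present. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / pairs

# Six pairs in total: five concordant and one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```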

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
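
As an illustration, the following sketch computes a bias-corrected Cramér's V from two lists of category labels. It follows the correction as it is commonly stated for Bergsma's estimator: φ² is reduced by (r-1)(k-1)/(n-1), and the numbers of rows r and columns k are shrunk accordingly. It is not the report's own implementation; the function name and the handling of degenerate inputs are assumptions.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    observed = Counter(zip(x, y))   # contingency table as {(row, col): count}
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        # Assumption: treat a variable with a single category as unassociated.
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Hypothetical labels: y is fully determined by x, so the association is perfect.
v_perfect = cramers_v_corrected(["a"] * 5 + ["b"] * 5, ["u"] * 5 + ["v"] * 5)
# Hypothetical labels: every combination occurs equally often, so V is 0.
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["c", "d", "c", "d"])
```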

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
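
One way to read nullity correlation, assuming it is the Pearson correlation between per-column present/missing indicators, is sketched below. Missing entries are represented by `None`, and the function name and sample columns are hypothetical. A value of 1 means two columns are always missing together, and -1 means one is present exactly when the other is missing.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because its indicator is constant and the correlation is undefined.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns: the first two are missing in exactly the same rows.
first = [5.0, None, 4.9, None, 5.0]
second = ["x", None, "y", None, "z"]
third = [None, 1, None, 2, None]
same_pattern = nullity_correlation(first, second)
opposite_pattern = nullity_correlation(first, third)
```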
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

   State_ab  fuelType  rating  renterTripsTaken  reviewCount  locationcity  locationcountry  ratedaily  vehiclemake    vehiclemodel  vehicletype  vehicleyear  Mean          Median       Stdev
0  WA        ELECTRIC  5.00    13                12           Seattle       US               135        Tesla          Model X       suv          2019         74067.361919  99918.09157  49940.72093
1  NM        ELECTRIC  5.00    2                 1            Tijeras       US               190        Tesla          Model X       suv          2018         57127.075000  82390.76250  43344.75000
2  NM        HYBRID    4.92    28                24           Albuquerque   US               35         Toyota         Prius         car          2012         57127.075000  82390.76250  43344.75000
3  NM        GASOLINE  5.00    21                20           Albuquerque   US               75         Ford           Mustang       car          2018         57127.075000  82390.76250  43344.75000
4  NM        GASOLINE  5.00    3                 1            Albuquerque   US               47         Chrysler       Sebring       car          2010         57127.075000  82390.76250  43344.75000
5  NM        GASOLINE  5.00    13                12           Albuquerque   US               58         Mercedes-Benz  GL-Class      suv          2012         57127.075000  82390.76250  43344.75000
6  NM        GASOLINE  4.42    13                12           Albuquerque   US               42         GMC            Yukon XL      suv          2005         57127.075000  82390.76250  43344.75000
7  NM        GASOLINE  4.90    12                10           Albuquerque   US               117        Ford           Expedition    suv          2018         57127.075000  82390.76250  43344.75000
8  NM        GASOLINE  5.00    1                 1            Albuquerque   US               102        Ford           Focus RS      car          2016         57127.075000  82390.76250  43344.75000
9  NM        GASOLINE  4.76    22                17           Albuquerque   US               49         Ford           EcoSport      suv          2018         57127.075000  82390.76250  43344.75000

Last rows

      State_ab  fuelType  rating  renterTripsTaken  reviewCount  locationcity        locationcountry  ratedaily  vehiclemake  vehiclemodel       vehicletype  vehicleyear  Mean          Median        Stdev
5841  HI        GASOLINE  NaN     0                 0            Schofield Barracks  US               165        Land Rover   Range Rover Velar  suv          2018         77859.586957  95679.362319  52302.688406
5842  HI        GASOLINE  5.00    7                 6            Mililani            US               51         Acura        RDX                suv          2008         77859.586957  95679.362319  52302.688406
5843  HI        GASOLINE  5.00    3                 3            Waipahu             US               69         Ford         Transit Van        van          2015         77859.586957  95679.362319  52302.688406
5844  HI        GASOLINE  4.88    27                24           Honolulu            US               68         Lexus        IS 250 C           car          2011         77859.586957  95679.362319  52302.688406
5845  HI        GASOLINE  5.00    2                 2            Mililani            US               50         Nissan       Rogue              suv          2014         77859.586957  95679.362319  52302.688406
5846  HI        GASOLINE  5.00    32                27           Honolulu            US               33         Chevrolet    Cruze              car          2017         77859.586957  95679.362319  52302.688406
5847  HI        HYBRID    5.00    17                16           Aiea                US               49         Lexus        HS 250h            car          2010         77859.586957  95679.362319  52302.688406
5848  HI        GASOLINE  4.94    18                17           Kailua              US               35         smart        fortwo             car          2013         77859.586957  95679.362319  52302.688406
5849  HI        GASOLINE  NaN     1                 0            Waipahu             US               77         GMC          Savana             van          2015         77859.586957  95679.362319  52302.688406
5850  HI        GASOLINE  5.00    16                14           Kailua              US               35         smart        fortwo             car          2013         77859.586957  95679.362319  52302.688406

Duplicate rows

Most frequently occurring

   State_ab  fuelType  rating  renterTripsTaken  reviewCount  locationcity  locationcountry  ratedaily  vehiclemake  vehiclemodel  vehicletype  vehicleyear  Mean          Median        Stdev         # duplicates
1  TX        GASOLINE  5.00    4                 3            Bedford       US               35         Nissan       Sentra        car          2019         65407.010435  85856.513043  46045.533913  3
0  MN        GASOLINE  4.88    13                8            Bloomington   US               39         Ford         Escape        suv          2012         71403.998440  88461.725429  49469.213729  2
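
A listing like this can be produced by counting identical rows and keeping those that occur more than once, most frequent first. The sketch below assumes that "# duplicates" is the number of times each repeated row occurs. The function name and the shortened sample rows are hypothetical.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    repeated.sort(key=lambda item: item[1], reverse=True)
    return repeated

# Hypothetical, shortened rows: (state, city, make, model).
rows = [
    ("MN", "Bloomington", "Ford", "Escape"),
    ("TX", "Bedford", "Nissan", "Sentra"),
    ("TX", "Bedford", "Nissan", "Sentra"),
    ("WA", "Seattle", "Tesla", "Model X"),
    ("MN", "Bloomington", "Ford", "Escape"),
    ("TX", "Bedford", "Nissan", "Sentra"),
]
duplicates = duplicate_rows(rows)
```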